Computer-implemented method, system, and storage medium for prefetching in a distributed graph architecture

ABSTRACT

Various embodiments of the present disclosure relate to a computer-implemented method, a system, and a storage medium, where a graph stored in a computing system is logically divided into subgraphs, the subgraphs are stored on different interconnected (or coupled) devices in the computing system, and nodes of the subgraphs include hub nodes connected to adjacent subgraphs. Each device stores attributes and node structure information of the hub nodes of the subgraphs stored on the other devices, and a software or hardware prefetch engine on the device prefetches attributes and node structure information associated with a sampled node. A prefetcher on an interface device connected to the interconnected (or coupled) devices may further prefetch attributes and node structure information of nodes of the subgraphs on other devices. A traffic monitor is provided on the interface device to monitor traffic. When the traffic is small, the interface device prefetches node attributes and node structure information.

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority to China Patent Application No. 202110701976.1, filed Jun. 24, 2021 by Wei HAN et al., which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the field of computers, and specifically, to a computer-implemented method, a system, and a non-transitory computer-readable storage medium.

BACKGROUND

A graph is a type of data structure or database that is stored and executed by a computing system and used to model a set of objects and connections (relationships) between the objects in the set. The objects are represented as nodes (or vertices) in the graph connected or linked through edges. A characteristic or attribute of an object is associated with a node used to represent the object.

A graph can be used to identify dependency, cluster, similarity, match, category, flow, cost, centrality, and the like in a large dataset. Graphs are used in a wide range of applications, including but not limited to graph analysis and graph neural networks (GNNs), and more specifically applications such as an online shopping engine, social networking, a recommendation engine, a mapping engine, failure analysis, network management, and a search engine. Unlike in applications for facial recognition, where the sampled nodes (pixels) close to each other may be grouped, in the foregoing types of applications, the sampled nodes may be separated from each other by a plurality of hops (that is, two sampled nodes are spaced apart by a plurality of other nodes), and access may be performed randomly or irregularly.

Graphs allow for faster retrieval and navigation of complex hierarchies that are difficult to model in relational systems. A large number of processing operations associated with graphs include a graph traversal operation, such as pointer chasing, which reads a node to determine one or more edges, where the identified edge or edges point to and are connected to one or more other nodes, the one or more other nodes may in turn be read to determine corresponding other edges, and so on.

Graph data generally includes node structure information and attributes. The node structure information may include, for example, information for identifying a node (for example, a node identity, or a node identifier, which may be simply referred to as a node ID) and information for identifying a neighbor node of the node. The attributes may include features or properties of an object and values of these features or properties. The object is represented by the node, and the features or properties of the object are associated with the node used to represent the object. For example, if the object represents a person, its features may include the person's age and gender, and in this case, the attributes also include a value to characterize age and a value to characterize gender.
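By way of example and not limitation, the split between node structure information and attributes described above may be sketched as a simple record. The class name `NodeRecord` and its field names are hypothetical and are chosen only for illustration; the disclosure does not prescribe any particular layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeRecord:
    """Hypothetical per-node record: structure information plus attributes."""

    # Node structure information: the node's own identifier.
    node_id: int
    # Node structure information: identifiers of neighbor nodes.
    neighbor_ids: List[int] = field(default_factory=list)
    # Attributes: feature or property name mapped to its value.
    attributes: Dict[str, object] = field(default_factory=dict)


# A node representing a person, with age and gender as attributes.
person = NodeRecord(
    node_id=7,
    neighbor_ids=[3, 12],
    attributes={"age": 34, "gender": "F"},
)
```

Keeping the identifiers separate from the attribute values reflects the observation that structure information is small compared with attribute data.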

A graph may be on the order of terabytes in size and may contain billions of nodes and trillions of edges. Therefore, a graph can be divided into a plurality of subgraphs, and the plurality of subgraphs can be distributed on a plurality of devices, that is, a large graph can be divided into a plurality of smaller subgraphs stored in different devices.

The node sampling phase is one of the main causes of graph-related delays. Each node and each edge has an associated performance cost, and therefore, accessing data in a large graph may result in a very high overall communication cost (for example, latency incurred and bandwidth consumed). Because information transmission between devices adds delay, the delay in a distributed graph increases accordingly.

It would be beneficial to reduce the delay associated with a large graph (especially a distributed graph).

SUMMARY

Embodiments of the present disclosure provide a solution to resolve the foregoing problem. In general, the embodiments of the present disclosure describe methods and systems for prefetching in a distributed graph architecture.

More specifically, in some embodiments of the present disclosure, a graph stored in a computing system is logically divided into a plurality of subgraphs, where the plurality of subgraphs are stored on a plurality of different interconnected (or coupled) devices in the computing system, and nodes of the subgraphs include hub nodes connected to adjacent subgraphs.

In some embodiments of the present disclosure, each interconnected (or coupled) device stores attributes and node identifiers (IDs) of the hub nodes of the plurality of subgraphs on other interconnected (or coupled) devices. In addition, a software or hardware prefetch engine on the device prefetches attributes and a node identifier associated with a sampled node.

Further, in some embodiments of the present disclosure, a prefetcher of a device connected (or coupled) to the interconnected (or coupled) devices can prefetch attributes, node identifiers and other node structure information of nodes of a subgraph on any interconnected (or coupled) device to another interconnected (or coupled) device that requires or may require the node attributes and node structure information. In some embodiments of the present disclosure, a traffic monitor is provided on an interface device to monitor traffic, and when the traffic is small, the interface device prefetches node attributes.

According to a first embodiment of the present disclosure, a computer-implemented method is provided, including: accessing a plurality of hub nodes in a graph, where the graph includes a plurality of nodes and is logically divided into a plurality of subgraphs, each of the subgraphs includes at least one of the hub nodes, and each of the hub nodes in a subgraph separately connects the subgraph to another subgraph in the plurality of subgraphs; storing attributes and node structure information associated with hub nodes of a first subgraph in the plurality of subgraphs in a second device, where the second device is further configured to store information of a second subgraph in the plurality of subgraphs; and storing attributes and node structure information associated with hub nodes of the second subgraph in a first device, where the first device is further configured to store information of the first subgraph.

In some embodiments, the attributes and node structure information associated with the hub nodes of the first subgraph and the attributes and node structure information associated with the hub nodes of the second subgraph include one or more of the following attributes and node structure information: a corresponding attribute value, a corresponding node identifier, and a corresponding node structure.

In some embodiments, the method further includes: prefetching the attributes and node structure information associated with a hub node of the first subgraph into a prefetch buffer of the first device when a gap between the hub node of the first subgraph and a root node of the first subgraph is a single hop.

In some embodiments, the method further includes: prefetching the attributes and node structure information associated with a hub node of the second subgraph into a prefetch buffer of the first device when a hub node of the first subgraph is sampled, a gap between the hub node of the first subgraph and a root node of the first subgraph is a single hop, and a gap between the hub node of the first subgraph and the hub node of the second subgraph is a single hop.
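By way of example and not limitation, the two single-hop prefetch conditions above may be sketched as follows. The function name `hub_prefetch_candidates`, its parameters, and the adjacency-list representation are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict, List, Set


def hub_prefetch_candidates(
    root: int,
    adj: Dict[int, List[int]],
    local_hubs: Set[int],
    remote_hubs: Set[int],
    sampled: Set[int],
) -> List[int]:
    """Return ids of hub nodes whose data would be prefetched for `root`.

    Rule 1: a local hub node one hop from the root is prefetched.
    Rule 2: if that local hub node is also sampled, any remote hub node
            one hop from it is prefetched as well.
    """
    to_prefetch: List[int] = []
    for node in adj.get(root, []):
        if node not in local_hubs:
            continue
        to_prefetch.append(node)  # rule 1
        if node in sampled:
            for nxt in adj.get(node, []):
                if nxt in remote_hubs and nxt not in to_prefetch:
                    to_prefetch.append(nxt)  # rule 2
    return to_prefetch


# Hypothetical toy graph: root 1 neighbors local hub 121, which neighbors remote hub 161.
adj = {1: [121, 5], 121: [1, 161], 5: [1]}
with_sampling = hub_prefetch_candidates(1, adj, {121}, {161}, sampled={121})
without_sampling = hub_prefetch_candidates(1, adj, {121}, {161}, sampled=set())
```

In this sketch, the remote hub node is fetched only after the local hub node has actually been sampled, which avoids filling the prefetch buffer with data for edges that are never followed.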

In some embodiments, the method further includes: acquiring node identifiers of a plurality of nodes adjacent to a root node that is adjacent to a first hub node, sampling at least one subset of the nodes corresponding to these node identifiers, and prefetching attributes of the nodes in the sampled subset into a prefetch buffer of the first device.

In some embodiments, the method further includes: prefetching attributes and node structure information associated with one of the plurality of nodes into a buffer of a third device, where the third device is coupled to the first device and the second device.

In some embodiments, the method further includes: using the third device to monitor traffic, where when a measured value of the traffic meets a threshold, the prefetching is performed.
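By way of example and not limitation, traffic-gated prefetching may be sketched as follows. The sketch assumes that "meets a threshold" means the traffic measured in the current window is below a byte threshold, which is one reading consistent with prefetching when the traffic is small. The names `TrafficMonitor`, `prefetch_allowed`, and `maybe_prefetch` are hypothetical.

```python
from typing import Callable, Dict


class TrafficMonitor:
    """Hypothetical traffic monitor for an interface device."""

    def __init__(self, threshold_bytes: int) -> None:
        self.threshold_bytes = threshold_bytes
        self.window_bytes = 0

    def record(self, nbytes: int) -> None:
        """Account for demand traffic observed in the current window."""
        self.window_bytes += nbytes

    def reset_window(self) -> None:
        """Start a new measurement window."""
        self.window_bytes = 0

    def prefetch_allowed(self) -> bool:
        # Assumption: the threshold is met when measured traffic is below it.
        return self.window_bytes < self.threshold_bytes


def maybe_prefetch(
    monitor: TrafficMonitor,
    fetch: Callable[[int], dict],
    node_id: int,
    buffer: Dict[int, dict],
) -> bool:
    """Prefetch node data into `buffer` only while traffic is light."""
    if not monitor.prefetch_allowed():
        return False
    buffer[node_id] = fetch(node_id)
    return True


monitor = TrafficMonitor(threshold_bytes=1000)
buffer: Dict[int, dict] = {}
fetched_when_idle = maybe_prefetch(monitor, lambda n: {"id": n}, 161, buffer)
monitor.record(5000)  # heavy demand traffic arrives
fetched_when_busy = maybe_prefetch(monitor, lambda n: {"id": n}, 162, buffer)
```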

In some embodiments, in response to a request from the first device, the prefetching includes: acquiring node identifiers of a plurality of nodes adjacent to a second hub node on the second device; sampling at least one subset of the nodes corresponding to these node identifiers; and extracting attributes of the nodes in the subset into the buffer of the third device.

According to a second embodiment of the present disclosure, a system is provided, including: a processor; a storage unit connected (or coupled) to the processor; and a plurality of interconnected (or coupled) devices connected (or coupled) to the storage unit, where the plurality of interconnected (or coupled) devices include a first device and a second device, the first device includes a memory and at least one buffer, and the second device includes a memory and at least one buffer; where the first device is configured to store information of nodes of a first subgraph of a graph, the second device is configured to store information of nodes of a second subgraph of the graph, and the graph includes a plurality of subgraphs; the nodes of the first subgraph include a first hub node, the nodes of the second subgraph include a second hub node, and the first hub node and the second hub node are connected to each other through an edge; and the first device stores attributes and node structure information associated with the second hub node, and the second device stores attributes and node structure information associated with the first hub node.

In some embodiments, the first device includes an access engine, and the access engine is configured to: prefetch node identifiers of a plurality of nodes adjacent to a root node that is adjacent to the first hub node, sample at least one subset of the nodes corresponding to the node identifiers, and acquire attributes of the nodes in the subset.

In some embodiments, the attributes and node structure information associated with the first hub node and the attributes and node structure information associated with the second hub node include one or more of the following attributes and node structure information: a corresponding attribute value, a corresponding node identifier, and a corresponding node structure.

In some embodiments, when a gap between the first hub node and a root node of the first subgraph is a single hop, the attributes and node structure information associated with the first hub node are prefetched into a prefetch buffer of the first device.

In some embodiments, when the first hub node is separated from a root node and is sampled, and a gap between the first hub node and the second hub node is a single hop, the attributes and node structure information associated with the second hub node are prefetched into a prefetch buffer of the first device.

In some embodiments, the system further includes: a third device connected (or coupled) to the first device and the second device, where the third device includes a prefetch buffer, and the third device prefetches attributes and node structure information associated with one of the plurality of nodes into the prefetch buffer.

In some embodiments, the third device further includes a traffic monitor, and when traffic measured by the traffic monitor meets a threshold, the third device prefetches the attributes and node structure information associated with the node into the prefetch buffer.

In some embodiments, the third device is configured to: in response to a request from the first device, acquire node identifiers of a plurality of nodes adjacent to the second hub node on the second device, sample at least one subset of the nodes corresponding to the node identifiers, and extract attributes of the nodes in the subset into the prefetch buffer of the third device.

In some embodiments, the first device includes a field programmable gate array, the second device includes a field programmable gate array, and the third device includes a memory-over-fabric connected (or coupled) to the first device and the second device.

According to a third embodiment of the present disclosure, a non-transitory computer-readable storage medium is further provided, including computer-executable instructions stored thereon, where the computer-executable instructions include: a first instruction, used to access a graph, where the graph includes a plurality of subgraphs, the plurality of subgraphs include a first subgraph and a second subgraph, the first subgraph includes a first node set having a first hub node, and the second subgraph includes a second node set having a second hub node, and the first subgraph and the second subgraph are connected through an edge connecting the first hub node and the second hub node; a second instruction, used to store attributes and node structure information associated with the second hub node in a first device, where the first device further stores information associated with the first subgraph; and a third instruction, used to store attributes and node structure information associated with the first hub node in a second device, where the second device further stores information associated with the second subgraph.

In some embodiments, the non-transitory computer-readable storage medium further includes: a fourth instruction, used to prefetch the attributes and node structure information associated with the first hub node into a prefetch buffer of the first device when a gap between the first hub node and a root node of the first subgraph is a single hop.

In some embodiments, the non-transitory computer-readable storage medium further includes: a fifth instruction, used to prefetch the attributes and node structure information associated with the second hub node into a prefetch buffer of the first device when the first hub node is sampled, a gap between the first hub node and a root node of the first subgraph is a single hop, and a gap between the first hub node and the second hub node is a single hop.

In some embodiments, the non-transitory computer-readable storage medium further includes: a sixth instruction, used to acquire node identifiers of a plurality of nodes adjacent to a root node that is adjacent to the first hub node; a seventh instruction, used to sample at least one subset of the nodes corresponding to these node identifiers; and an eighth instruction, used to prefetch attributes of the nodes in the subset into a prefetch buffer of the first device.

In some embodiments, the non-transitory computer-readable storage medium further includes: a ninth instruction, used to prefetch attributes and node structure information associated with one of the plurality of nodes into a buffer of a third device, where the third device is coupled to the first device and the second device.

In some embodiments, the non-transitory computer-readable storage medium further includes: a tenth instruction, used to monitor traffic by using the third device, and to prefetch attributes and node structure information associated with the node into the buffer of the third device when a measured value of the traffic meets a threshold.

Therefore, according to the embodiments of the present disclosure, latency associated with an operation of transmitting information between a plurality of interconnected (or coupled) devices is eliminated, thereby reducing an overall communication cost of the computing system. In addition to communication cost reduction, resources of the computing system are utilized more efficiently.

Those of ordinary skill in the art will appreciate the foregoing objective, other objectives, and advantages provided by various embodiments of the present disclosure after reading the following detailed description of the embodiments shown in the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are incorporated in this specification and form a part of this specification, where the same or similar reference numerals depict the same or similar elements. The accompanying drawings illustrate some embodiments of the present disclosure, and together with the detailed description, serve to explain principles of the present disclosure.

FIG. 1 illustrates a schematic diagram of an example distributed graph architecture of graphs that are stored on and executed by a computing system according to some embodiments of the present disclosure;

FIG. 2A illustrates a schematic block diagram of components in an example computing system according to some embodiments of the present disclosure;

FIG. 2B illustrates a schematic diagram of a mapping relationship between subgraphs of an example distributed graph and devices in a computing system according to some embodiments of the present disclosure;

FIG. 3 illustrates a schematic block diagram of selected elements or selected components of a device for storing and computing subgraphs according to some embodiments of the present disclosure;

FIG. 4 illustrates a schematic diagram of elements of two adjacent subgraphs in a distributed graph according to some embodiments of the present disclosure;

FIG. 5 illustrates a schematic block diagram of an interface device connected (or coupled) to a device for storing and computing subgraphs according to some embodiments of the present disclosure; and

FIG. 6 illustrates a schematic flowchart of a computer-implemented method according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to various embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. While the disclosure is described in conjunction with these embodiments, it should be understood that it is not intended to limit the present disclosure to these embodiments. On the contrary, the present disclosure is intended to cover alternatives, modifications and equivalents as may be included within the spirit and scope of the present disclosure as defined by the appended claims. In addition, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it should be understood that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present disclosure.

Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing and other symbolic representations of operations on data bits within computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In this application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computing system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present disclosure, discussions utilizing terms such as “accessing”, “prefetching”, “sampling”, “sending”, “writing”, “reading”, “dividing”, “requesting”, “storing”, “recording”, “transferring”, and “selecting” refer to actions and processes (for example, the method shown in FIG. 6) of a device or computing system or similar electronic computing device or system (for example, the systems shown in FIG. 2A, FIG. 2B, FIG. 3, and FIG. 5). A computing system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within memories, registers or other such information storage, transmission or display devices.

Some elements or embodiments described herein may be discussed in a general context of computer-executable instructions (for example, program modules) residing on some form of computer-readable storage medium and executed by one or more computers or other devices. By way of example and not limitation, computer-readable storage media may include non-transitory computer storage media and communication media. Generally, program modules include routines, programs, objects, components, data structures, and the like, for performing particular tasks or implementing particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.

Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information (for example, computer readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, a double data rate (DDR) memory, a random access memory (RAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a flash memory (for example, a solid state drive (SSD)) or other memory technologies, a compact disk read only memory (CD-ROM), a digital versatile disk (DVD) or other optical storage, a magnetic cassette, a magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed to retrieve the information.

Communication media may embody computer executable instructions, data structures, and program modules, and include any information delivery media. By way of example and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media. Combinations of any of the above may also be included within the scope of computer readable media.

FIG. 1 illustrates a schematic diagram of an example distributed graph architecture of graphs that are stored on and executed by a computing system (for example, computing systems shown in FIG. 2A, FIG. 2B, FIG. 3, and FIG. 5) according to some embodiments of the present disclosure. In the example shown in FIG. 1, a graph 100 is logically divided into three communities or subgraphs 102, 104, and 106. However, the number of subgraphs is not limited thereto. The graph 100 includes a plurality of nodes (each node is represented as a square in FIG. 1).

Typically, a community is a subset of nodes in a graph such that the number of edges within the community is greater than the number of edges linking the community to the rest of the graph. The graph 100 may be logically divided into communities or subgraphs by using community detection algorithms such as, but not limited to: the Kernighan-Lin (K-L) algorithm; the Girvan-Newman algorithm; a multi-level algorithm; a leading eigenvector algorithm; and the Louvain algorithm.

Each node in the graph 100 represents an object, and attributes and structure information of the object are associated with the node representing the object. Attributes of a node/object may include one or more features or properties of the object (for example, if the object represents a person, the features may include the person's age and/or gender), and attribute data may include values of the one or more features (for example, a numeric value to characterize the person's age and a flag to identify the person's gender). Structure information of a node/object may include, for example, information for identifying a node (for example, a node identifier) and information for identifying other nodes connected to the node.

Through one or more hub nodes, each subgraph is connected to adjacent subgraphs through corresponding edges. For example, in FIG. 1, a subgraph 102 includes hub nodes 121, 122, and 123, which are connected to hub nodes 161 and 162 of a subgraph 104 through corresponding edges. The hub nodes in the subgraph 102 are similarly connected to hub nodes in a subgraph 106, and vice versa, and hub nodes in the subgraph 104 are similarly connected to the hub nodes in the subgraph 106.

Neighboring or adjacent subgraphs (for example, the subgraphs 102 and 104) are connected to each other through a single hop. For example, an edge 110 connects a hub node 121 to a hub node 161. Nodes within the subgraphs in the graph 100 are also connected to each other through edges.
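By way of example and not limitation, hub nodes may be identified from a given partition by looking for edges whose endpoints fall in different subgraphs. The function name `find_hub_nodes` and the toy data below are hypothetical. Only the edge between nodes 121 and 161 mirrors edge 110 of FIG. 1; nodes 120 and 160 and the remaining edges are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def find_hub_nodes(
    edges: Iterable[Tuple[int, int]],
    subgraph_of: Dict[int, int],
) -> Dict[int, Set[int]]:
    """Return {subgraph id: ids of its hub nodes}.

    A node is treated as a hub node when at least one of its edges
    ends in a different subgraph.
    """
    hubs: Dict[int, Set[int]] = {sg: set() for sg in set(subgraph_of.values())}
    for u, v in edges:
        if subgraph_of[u] != subgraph_of[v]:
            hubs[subgraph_of[u]].add(u)
            hubs[subgraph_of[v]].add(v)
    return hubs


# Node id -> subgraph id (subgraphs 102 and 104 of FIG. 1).
subgraph_of = {120: 102, 121: 102, 122: 102, 160: 104, 161: 104, 162: 104}
edges = [
    (120, 121), (120, 122),  # edges inside subgraph 102
    (121, 161), (122, 162),  # edges crossing between 102 and 104
    (160, 161), (160, 162),  # edges inside subgraph 104
]
hubs = find_hub_nodes(edges, subgraph_of)
```

The partition itself (`subgraph_of`) is taken as given here; it would come from a community detection step such as those listed above.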

FIG. 2A illustrates a block diagram of components in an example computing system 200 according to some embodiments of the present disclosure. The computing system 200 may be configured to store and execute a distributed graph architecture of the graph 100 in the example shown in FIG. 1.

In the example shown in FIG. 2A, the computing system 200 includes a plurality of central processing units (CPUs), for example, including a CPU 202. In an embodiment of the present disclosure, each CPU includes or is coupled to a corresponding graphics processing unit (GPU), such as a GPU 204. In some embodiments, each CPU is connected (or coupled) to a corresponding top-of-rack (TOR) switch (such as a TOR switch 206) via a network interface card (NIC) (such as a NIC 208).

In this embodiment of the present disclosure, each CPU is also connected (or coupled) to a corresponding device or integrated circuit, for example, connected (or coupled) to devices 211, 212, 213 . . . N (that is, devices 211-N). In the embodiment shown in FIG. 2A, each of the devices 211-N is a field programmable gate array (FPGA).

In this embodiment of the present disclosure, the devices 211-N are interconnected (or coupled) in a particular manner, so that among these devices, any device can communicate with any other device, transmit data to any other device, and receive data from any other device. In some embodiments, the devices 211-N are interconnected (or coupled) through a fully connected local network (FCLN) 216. As described below in conjunction with FIG. 3, in some embodiments, each of the devices 211-N is connected (or coupled) to an interface device 316, and the interface device 316 is, for example, a memory-over-fabric (MoF).

FIG. 2B illustrates a schematic diagram of a mapping relationship between the example subgraphs 102, 104, and 106 and the devices 211-N according to some embodiments of the present disclosure. In these embodiments, each of the devices 211-N stores and computes its own subgraph. In an example, the subgraph 102 is stored and computed by the device 211, the subgraph 106 is stored and computed by the device 212, and the subgraph 104 is stored and computed by the device 213.

FIG. 3 illustrates a schematic block diagram of selected elements or selected components of a device (for example, a device 211) for storing and computing subgraphs according to some embodiments of the present disclosure. As described herein, the device 211 also helps to prefetch node attributes and node structure information (for example, node identifiers). Configurations and functions of each of other devices 212, 213 . . . N (that is, the devices 212-N) shown in FIGS. 2A and 2B are similar to those of the device 211. The devices 211-N may include elements or components other than those shown and described herein, and the elements or components may be coupled in a way shown in the figure or in other ways.

In an example, the device 211 is described in terms of the functions performed by some modules in the device 211. Although the modules are described and illustrated as separate modules, the present disclosure is not limited thereto. In other words, for example, combinations of these modules/functions can be integrated into a single module that performs a plurality of functions.

In the embodiment of FIG. 3, the example device 211 includes a command encoder 302 and a command decoder 304 that are coupled to a command scheduler 306. The device 211 includes or is coupled to a communication interface 308 (for example, an Advanced Extensible Interface) to communicate with other devices (for example, a CPU 202) of the system 200 shown in FIG. 2A.

The device 211 is also coupled to a storage unit via a load unit (LD unit) 310. As described above, the device 211 may store and compute the subgraph 102. The storage unit includes a memory 312 on the device 211 (for example, a DDR memory) for storing attributes, node identifiers, and other node structure information of nodes in the subgraph 102. The storage unit further includes a main memory 314 (for example, a RAM), and the main memory 314 is coupled to the device 211 and the other devices 212-N.

The device 211 is also coupled to the other devices 212-N via an interface device 316 (for example, an MoF) to access subgraphs on the other devices. The following further describes the interface device 316 with reference to FIG. 5.

It should be noted that the device 211 shown in FIG. 3 includes a prefetcher 320, one or more buffers 322 with a prefetch buffer 323, a neighbor acquisition module 332, a sample acquisition module 334, an attribute acquisition module 336, and an encoding acquisition module 338. The prefetch buffer 323 is configured to store prefetched node attributes.

In the exemplary device 211, the prefetcher 320, the prefetch buffer 323, the neighbor acquisition module 332, the sample acquisition module 334, the attribute acquisition module 336, and the encoding acquisition module 338 constitute elements of an access engine (AxE) implemented on the device 211. The access engine may be implemented in hardware or software, or a combination of hardware and software.

In an embodiment of the present disclosure, because the number of hub nodes is relatively small, the attributes and node structure information (for example, node identifiers) of all hub nodes in all subgraphs of the graph 100 (see FIG. 1) can be stored by each of the devices 211-N in the system 200 (see FIG. 2A).
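The replication of hub nodes described above can be illustrated with a minimal sketch. The `Device` class, the `replicate_hubs` function, and the attribute values are hypothetical names and data introduced only for illustration; the node numbers follow FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one of the devices 211-N."""
    name: str
    # node identifier -> attributes of nodes in the locally stored subgraph
    local_nodes: dict = field(default_factory=dict)
    # node identifier -> attributes of the hub nodes of all subgraphs
    hub_replicas: dict = field(default_factory=dict)


def replicate_hubs(devices, hub_ids):
    """Store a copy of every hub node's attributes on every device."""
    all_hubs = {}
    for dev in devices:
        for node_id, attrs in dev.local_nodes.items():
            if node_id in hub_ids:
                all_hubs[node_id] = attrs
    for dev in devices:
        dev.hub_replicas = dict(all_hubs)


# Illustrative data only: the attribute values are assumptions.
d211 = Device("211", {413: {"age": 30}, 415: {"age": 41}})
d212 = Device("212", {423: {"age": 25}, 425: {"age": 52}})
replicate_hubs([d211, d212], hub_ids={415, 425})
```

After the call, each device holds the attributes of both hub nodes 415 and 425, while the non-hub nodes 413 and 423 remain only on the device that stores their subgraph.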

In the subgraphs of the graph 100 (see FIG. 1), a node of interest can be referred to as a root node in this specification. For example, a node can be selected and attributes of that node (root node) can be read or acquired.

However, in some cases, not all attributes of the root node may be known. For example, if an object represented by the root node is a person, the person's age may be recorded, but the person's gender may not be recorded. However, by using a community detection algorithm (used to organize and divide the nodes in the graph 100 into a plurality of communities or a plurality of subgraphs), an unknown attribute of the root node is deduced or estimated from attributes of adjacent nodes (the adjacent nodes described herein are nodes connected to the root node by a single hop). Optionally, the unknown attribute of the root node can be deduced or estimated based on attributes of nearby nodes (the nearby nodes described herein are nodes connected to the root node by a plurality of hops).

Referring to FIG. 3, the neighbor acquisition module 332 is configured to determine and extract from the memory 312 the node identifiers of the nodes adjacent to or near the root node. The node identifiers constitute a relatively small amount of data, and therefore, obtaining those node identifiers consumes only a relatively small amount of system resources (for example, bandwidth).

The sample acquisition module 334 then samples the nodes with the node identifiers identified by the neighbor acquisition module 332. The samples obtained from sampling may include all nodes identified by the neighbor acquisition module 332, or only include a subset of these nodes. For example, a subset of the nodes may be selected randomly or based on weights assigned to the nodes. The weight of a node may be determined, for example, based on a distance of the node from the root node, and the distance is measured by the number of hops between the node and the root node.
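As a hedged illustration of weight-based sampling, the following sketch selects a subset of candidate nodes without replacement. The weighting rule (the inverse of the hop count) and the function name `sample_neighbors` are assumptions chosen for the example; the disclosure only states that a weight may depend on the hop distance.

```python
import random


def sample_neighbors(hops_by_node, k, rng):
    """Pick up to k distinct nodes, favoring nodes fewer hops from the root.

    hops_by_node maps a node identifier to its hop distance (>= 1) from
    the root node. The weight 1 / hops is an assumption for illustration.
    """
    candidates = list(hops_by_node)
    weights = [1.0 / hops_by_node[n] for n in candidates]
    chosen = []
    while candidates and len(chosen) < k:
        # Draw one index according to the remaining weights.
        i = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(i))
        weights.pop(i)
    return chosen


# Hop distances from the root node 413 in the FIG. 4 example.
picked = sample_neighbors({415: 1, 425: 2, 423: 3}, k=2, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the selection reproducible; a uniform random subset corresponds to giving every node the same weight.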

Afterwards, the attribute acquisition module 336 acquires from the memory 312 the attributes of the nodes sampled by the sample acquisition module 334. As described above, if the samples obtained through sampling by the sample acquisition module 334 include only a selected subset of nodes, the amount of data (attributes) obtained is reduced, thereby consuming fewer system resources.

Next, the encoding acquisition module 338 encodes the acquired attributes and writes them into the main memory 314 (for example, a RAM), where the attributes can be accessed for other processing when necessary.
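The four stages described above (neighbor acquisition, sampling, attribute acquisition, and encoding) can be sketched as a simple software pipeline. All function names are hypothetical, the truncation-based sampling policy and the JSON encoding are placeholders, and the node 411 and the attribute values are invented for the example.

```python
import json


def acquire_neighbors(adjacency, root):
    """Neighbor acquisition: return identifiers of nodes adjacent to the root."""
    return sorted(adjacency.get(root, ()))


def acquire_samples(neighbor_ids, limit):
    """Sample acquisition: keep at most `limit` nodes (placeholder policy)."""
    return neighbor_ids[:limit]


def acquire_attributes(memory, node_ids):
    """Attribute acquisition: read attributes of the sampled nodes only."""
    return {node_id: memory[node_id] for node_id in node_ids}


def encode_attributes(attributes):
    """Encoding: serialize attributes before they are written to main memory."""
    return json.dumps(attributes, sort_keys=True).encode("utf-8")


def access_engine(adjacency, memory, root, limit):
    """Run the four stages in the order described in the text."""
    ids = acquire_neighbors(adjacency, root)
    sampled = acquire_samples(ids, limit)
    return encode_attributes(acquire_attributes(memory, sampled))


# Illustrative data: node 411 and all attribute values are assumptions.
adjacency = {413: {411, 415}}
memory = {411: {"age": 19}, 415: {"age": 41}}
encoded = access_engine(adjacency, memory, root=413, limit=1)
```

Because only the sampled subset reaches the attribute stage, the amount of attribute data read from memory shrinks with the sample size, which is the resource saving the text describes.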

In an embodiment of the present disclosure, when a root node in a subgraph stored in the device 211 (see FIG. 3) is selected, and the root node is separated from hub nodes of the subgraph by a single hop, the device 211 may prefetch the attributes of those hub nodes from the memory 312 into the buffer 322. In this case, hub nodes of adjacent subgraphs stored on a device (for example, the device 212) adjacent to the device 211 are separated from the root node by two hops. Waiting time can therefore be reduced by prefetching the attributes of the hub nodes of the subgraphs on the device 211 into the prefetch buffer 323, and prefetching the attributes of the hub nodes of adjacent subgraphs (for example, stored on the device 212) into the prefetch buffer 323. Then, if one of the above hub nodes is sampled, its attributes can be acquired directly from the prefetch buffer 323.

In addition, if one of the above hub nodes is sampled, the device 211 may send a request to another device (for example, the device 212) through the interface device 316, to prefetch attributes of nodes adjacent to the hub nodes on the other device.

FIG. 4 illustrates a schematic diagram of elements in two adjacent subgraphs 102 and 106 in the graph 100 according to some embodiments of the present disclosure. In this example, a hub node 415 of the subgraph 102 and a hub node 425 of the subgraph 106 are connected by a single hop through an edge 405, which means that the hub node 415 of the subgraph 102 and the hub node 425 of the subgraph 106 are first-hop neighbors of each other. The subgraph 102 includes a node 413 adjacent to the hub node 415, which means that the node 413 and the hub node 415 are connected by a single hop through an edge 414. In this example, the node 413 and the hub node 425 are connected by two hops through the edge 414 and the edge 405. Similarly, the subgraph 106 includes a node 423 adjacent to the hub node 425. In this example, the node 413 is not a hub node, and may be referred to as an internal node or a non-hub node in the present disclosure. The subgraph 102 is stored and computed by the device 211 (which may be referred to as a first device herein), and the subgraph 106 is stored and computed by the device 212 (which may be referred to as a second device herein).
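The hop distances in the FIG. 4 example can be checked with a short breadth-first search. This is only an illustrative sketch: the adjacency list encodes the edges named in the text plus the adjacency between the node 423 and the hub node 425, and `hop_counts` is a hypothetical helper.

```python
from collections import deque


def hop_counts(adjacency, root):
    """Return the minimum number of hops from root to every reachable node."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Edge 414 joins 413 and 415; edge 405 joins 415 and 425; 423 is adjacent to 425.
fig4 = {413: [415], 415: [413, 425], 425: [415, 423], 423: [425]}
hops = hop_counts(fig4, 413)
```

With the node 413 as the root node, the hub node 415 is one hop away and the hub node 425 is two hops away, matching the description above.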

In some embodiments, when the node 413 is selected as the root node, the attributes of the hub node 415 may be prefetched into the prefetch buffer 323 of the first device 211. Similarly, when the node 423 is selected as the root node, the attributes of the hub node 425 may be prefetched into the prefetch buffer of the second device 212. In addition, when the node 413 is selected as the root node, the attributes of the hub node 425 may be prefetched into the prefetch buffer 323 of the first device 211. Similarly, when the node 423 is selected as the root node, the attributes of the hub node 415 may be prefetched into the prefetch buffer of the second device 212.
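A minimal sketch of this prefetch rule on the first device follows, assuming the hub node attributes of both subgraphs are already stored locally. The function `prefetch_for_root`, the dictionary-based buffer, and the attribute values are hypothetical and serve only to illustrate the rule.

```python
def prefetch_for_root(root, adjacency, local_hubs, hub_replicas, prefetch_buffer):
    """Prefetch hub attributes that are one or two hops away from the root.

    local_hubs: hub nodes of the subgraph stored on this device.
    hub_replicas: locally stored attributes of the hub nodes of all subgraphs.
    """
    for hub in adjacency.get(root, ()):
        if hub not in local_hubs:
            continue
        # Hub node of the local subgraph, one hop from the root.
        prefetch_buffer[hub] = hub_replicas[hub]
        for other in adjacency.get(hub, ()):
            if other in hub_replicas and other not in local_hubs:
                # Hub node of an adjacent subgraph, two hops from the root.
                prefetch_buffer[other] = hub_replicas[other]


# FIG. 4 topology; attribute values are illustrative assumptions.
fig4 = {413: [415], 415: [413, 425], 425: [415, 423], 423: [425]}
replicas = {415: {"age": 41}, 425: {"age": 52}}
buffer_323 = {}
prefetch_for_root(413, fig4, local_hubs={415}, hub_replicas=replicas,
                  prefetch_buffer=buffer_323)
```

If the hub node 415 or the hub node 425 is later sampled, its attributes can be read from `buffer_323` without a transfer between devices.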

Through the prefetching operations as described above, waiting time associated with transferring information between the interconnected (or coupled) devices can be reduced or eliminated in many cases, thereby reducing an overall communication cost of the computing system.

FIG. 5 illustrates a schematic block diagram of an interface device 316 (for example, an MoF) according to some embodiments of the present disclosure. In addition to the elements or components shown and described below, the device 316 may include other elements or components. The device 316 may be coupled to the devices 211-N in the computing system 200 shown in FIG. 2A, and may be referred to as a third device in the present disclosure.

The device 316 is coupled to the device 211, and optionally, to an access engine (AxE) on the device 211. In the example shown in FIG. 5, the device 316 includes: an assembler 502, configured to assemble and route data packets for data transfers between the devices in the computing system 200; a disassembler 504, configured to disassemble the data packets and arbitrate the data transfers; and a scheduler 506. The device 316 further includes an interface and other functional modules 508 to perform other functions, such as flow control and error detection, and interfaces with the devices 212-N.

In some embodiments, the device 316 further includes a prefetcher and a prefetch buffer 510. The prefetcher may prefetch attributes and node structure information (for example, node identifiers) of nodes adjacent to hub nodes and store them in the prefetch buffer 510. In the above example (for example, the example shown in FIG. 4), when the hub node 425 on the second device 212 is sampled by the first device 211, the nodes (for example, the node 423) adjacent to the hub node may also be sampled by the first device 211. In this case, in some embodiments, in response to sampling the hub node 425 on the second device 212, attributes of one or more nodes adjacent to the hub node are prefetched into the prefetch buffer 510 on the device 316, and then the device 211 can access and read the prefetch buffer 510 to acquire the attributes of the one or more nodes.

More specifically, in conjunction with FIG. 3 and FIG. 4, in some embodiments, for example, when the hub node 415 or the hub node 425 is sampled by the device 211, the neighbor acquisition module 332 of the device 211 sends a request to the device 316 for acquiring node identifiers, which are recorded on the second device 212, of nodes (for example, the node 423) adjacent to or near the hub node 425. In response to a request from the sample acquisition module 334, nodes (with node identifiers identified by the neighbor acquisition module 332 of the device 211, for example, the node 423) are sampled and attributes of these nodes are stored in the prefetch buffer 510 on the device 316. If a node is sampled, attributes of the node stored in the prefetch buffer 510 are sent by the device 316 to the first device 211. To be specific, in some embodiments, the operation of prefetching the attributes of the node is performed in response to a request from the attribute acquisition module 336 of the device 211.
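The request flow through the interface device can be sketched as follows. The class `InterfaceDevice`, its method names, and the truncation-based sampling are assumptions made for illustration; the sketch models only the prefetch buffer 510 and not the packet assembly, scheduling, or flow control of the device 316.

```python
class InterfaceDevice:
    """Hypothetical model of the prefetch path of the interface device 316."""

    def __init__(self, remote_adjacency, remote_memory):
        # Structure and attributes of the subgraph held by the second device.
        self.remote_adjacency = remote_adjacency
        self.remote_memory = remote_memory
        self.prefetch_buffer = {}  # stands in for the prefetch buffer 510

    def prefetch_neighbors(self, hub_id, limit):
        """On request, sample neighbors of a remote hub and buffer their attributes."""
        neighbor_ids = sorted(self.remote_adjacency.get(hub_id, ()))
        sampled = neighbor_ids[:limit]  # placeholder sampling policy
        for node_id in sampled:
            self.prefetch_buffer[node_id] = self.remote_memory[node_id]
        return sampled

    def read(self, node_id):
        """Return buffered attributes, or None if the node was not prefetched."""
        return self.prefetch_buffer.get(node_id)


# The node 423 is adjacent to the hub node 425; the attribute value is illustrative.
mof = InterfaceDevice({425: [423]}, {423: {"age": 25}})
sampled = mof.prefetch_neighbors(425, limit=4)
```

When the first device later samples the node 423, `mof.read(423)` returns the buffered attributes instead of triggering a new access to the second device.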

Further, other more aggressive prefetching schemes can be used, where attributes can be prefetched from nodes that are more than a single hop or two hops away from the root node.

The prefetching through the device 316 further reduces waiting time associated with the information transfer between the interconnected (or coupled) devices, thereby further reducing the overall communication cost of the computing system.

In some embodiments, the device 316 further includes a traffic monitor 516. For example, the traffic monitor 516 monitors traffic (for example, bandwidth consumed) in the device 316. For example, when the traffic is low (for example, below a threshold), node attributes are prefetched into the prefetch buffer 510 on the device 316. Therefore, in addition to reducing communication overhead, resources of the computing system are utilized more efficiently.
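Traffic-gated prefetching can be sketched as a queue of pending requests that is serviced only when the measured traffic is below a threshold. The class name, the normalized traffic value, and the choice to drain the whole queue in one step are assumptions for this example rather than details from the disclosure.

```python
from collections import deque


class TrafficGatedPrefetcher:
    """Hypothetical prefetcher that only runs while traffic is low."""

    def __init__(self, threshold, fetch):
        self.threshold = threshold  # traffic level below which prefetching is allowed
        self.fetch = fetch          # callable: node identifier -> attributes
        self.pending = deque()
        self.buffer = {}            # stands in for the prefetch buffer 510

    def request(self, node_id):
        """Queue a node whose attributes should be prefetched later."""
        self.pending.append(node_id)

    def tick(self, measured_traffic):
        """Service queued requests if traffic is low; return how many ran."""
        if measured_traffic >= self.threshold:
            return 0
        done = 0
        while self.pending:
            node_id = self.pending.popleft()
            self.buffer[node_id] = self.fetch(node_id)
            done += 1
        return done


remote = {423: {"age": 25}}  # illustrative attribute value
prefetcher = TrafficGatedPrefetcher(threshold=0.5, fetch=remote.__getitem__)
prefetcher.request(423)
busy = prefetcher.tick(measured_traffic=0.9)  # above threshold: nothing runs
idle = prefetcher.tick(measured_traffic=0.1)  # below threshold: queue drains
```

Deferring the prefetch to low-traffic periods keeps speculative transfers from competing with demand traffic, which is the efficiency gain the text refers to.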

The embodiments of the present disclosure significantly reduce a worst-case/long-tail delay. Compared with a reference scheme in which the hub nodes are not stored on the devices 211-N, simulation results of the embodiments of the present disclosure show that: as described above, the delay can be reduced by 47.6% by storing the hub nodes of all the devices 211-N in each device; when this improvement is further combined with the prefetch buffers and prefetch schemes on the devices 211-N, the delay can be reduced by 49.5%; and when the foregoing improvements are combined with the prefetch buffer and prefetch scheme on the device 316, the delay can be reduced by 74.8%.

FIG. 6 illustrates a schematic flowchart 600 of a computer-implemented method according to some embodiments of the present disclosure. All or some of the operations represented by the blocks in the flowchart 600 can be implemented as computer-executable instructions residing on a specific form of non-transitory computer-readable storage medium, and may be executed by a computing system such as the computing system 200 shown in FIG. 2A. In one example discussed below, a graph is logically divided into: a first subgraph stored and computed by a first device, and a second subgraph stored and computed by a second device. This scheme can be easily extended to more than two subgraphs and devices.

In block 602 shown in FIG. 6, with reference to FIG. 4, the hub nodes 415 and 425 in the graph 100 are accessed. The graph is logically divided into a plurality of subgraphs including a plurality of hub nodes, and the plurality of hub nodes are used to connect the subgraphs, as described above.

In block 604, attributes and node structure information (for example, node identifiers) associated with the hub nodes (for example, the node 415) of the first subgraph 102 are stored in the second device 212, where the second device 212 is further configured to store information of the second subgraph 106. Likewise, attributes and node structure information (for example, node identifiers) associated with the hub nodes (for example, the node 425) of the second subgraph are stored in the first device 211, where the first device 211 is further configured to store information of the first subgraph. The attributes include one or more node features or properties and their respective values, and as discussed above, the node structure information includes node identifiers and other structure information.

In block 606, node attributes are prefetched. Examples of the prefetch schemes are described above and below. However, the embodiments of the present disclosure are not limited to those examples.

In some embodiments, with reference to FIG. 3, node identifiers of the nodes adjacent to the root node (the root node being adjacent to a hub node, for example, the node 413 adjacent to the hub node 415) are determined or acquired, at least one subset of these nodes is sampled, and attributes of the nodes in the sampled subset are prefetched into the prefetch buffer 323 of the first device 211.

In some embodiments, when a gap between a hub node of the first subgraph and the root node (for example, the node 413) of the first subgraph is a single hop (for example, through the edge 414), the attributes associated with the hub node (for example, the node 415) of the first subgraph 102 are prefetched into the prefetch buffer 323 of the first device 211.

In some embodiments, when a hub node (for example, the node 415) of the first subgraph 102 is sampled, a gap between the hub node of the first subgraph 102 and a root node (for example, the node 413) of the first subgraph is a single hop, and a gap between the hub node of the first subgraph and the hub node of the second subgraph is a single hop (for example, through the edge 405), the attributes associated with the hub node (for example, the node 425) of the second subgraph 106 are prefetched into the prefetch buffer 323 of the first device 211.

In some embodiments, with reference to FIG. 5, attributes and node structure information (for example, node IDs) associated with nodes (for example, internal nodes or non-hub nodes) are prefetched into the prefetch buffer 510 of the third device 316 (for example, an MoF), where the third device 316 communicates (is coupled) with the first device 211 and the second device 212. More specifically, in these embodiments, in response to a request from the first device, node identifiers, which are on the second device, of nodes adjacent to the second hub node 425 are determined or acquired; at least one subset of these nodes is sampled; and the attributes of the sampled nodes are prefetched into the buffer 510 of the third device 316. In some embodiments, the device 316 is further configured to monitor traffic. In this case, if a measured value of traffic meets a threshold, the device 316 performs the prefetching.

Note that the word “coupled” in the various embodiments of the present disclosure indicates that elements can be directly connected, or indirectly coupled together through one or more intervening elements.

While the foregoing disclosure has set forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein may be individually configured using a wide range of configurations implemented individually and/or collectively. In addition, any disclosure of components contained within other components should be considered as an example because many other architectures can be used to implement the same functionality.

Although the subject matter has been described in a language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the present disclosure is not necessarily limited to the specific features or acts described herein. Rather, the specific features and acts described herein are disclosed as examples of implementing the present disclosure.

Embodiments of the present disclosure are thus described. While the present disclosure has included the description of specific embodiments, the present disclosure should not be construed as limited by these embodiments, but should be construed in accordance with the appended claims.

What is claimed is:
 1. A computer-implemented method, comprising: accessing a plurality of hub nodes in a graph, wherein the graph comprises a plurality of nodes and is logically divided into a plurality of subgraphs, the subgraph comprises at least one of the hub nodes, and each of the hub nodes in the subgraph separately connects the subgraph to another subgraph in the plurality of subgraphs; storing attributes and node structure information associated with the hub nodes of a first subgraph in the plurality of subgraphs in a second device, wherein the second device is further configured to store information of a second subgraph in the plurality of subgraphs; storing attributes and node structure information associated with the hub nodes of the second subgraph in a first device, wherein the first device is further configured to store information of the first subgraph; and prefetching the attributes and node structure information associated with a hub node of the first subgraph into a prefetch buffer of the first device when a gap between the hub node of the first subgraph and a root node of the first subgraph is a single hop.
 2. The method according to claim 1, further comprising: prefetching the attributes and node structure information associated with a hub node of the second subgraph into the prefetch buffer of the first device when a hub node of the first subgraph is sampled, a gap between the hub node of the first subgraph and a root node of the first subgraph is a single hop, and a gap between the hub node of the first subgraph and the hub node of the second subgraph is a single hop.
 3. The method according to claim 1, wherein the attributes and node structure information associated with the hub nodes of the first subgraph and the attributes and node structure information associated with the hub nodes of the second subgraph comprise one or more of the following attributes and node structure information: a corresponding attribute value, a corresponding node identifier, and a corresponding node structure.
 4. The method according to claim 1, further comprising: acquiring node identifiers of a plurality of nodes adjacent to a root node that is adjacent to a first hub node, sampling at least one subset of the nodes corresponding to these node identifiers, and prefetching attributes of the nodes in the sampled subset into the prefetch buffer of the first device.
 5. The method according to claim 1, further comprising: prefetching attributes and node structure information associated with one of the plurality of nodes into a buffer of a third device, wherein the third device is coupled to the first device and the second device.
 6. The method according to claim 5, further comprising: using the third device to monitor traffic, wherein when a measured value of the traffic meets a threshold, the prefetching is performed.
 7. The method according to claim 5, wherein in response to a request from the first device, the prefetching includes: acquiring node identifiers of a plurality of nodes adjacent to a second hub node on the second device; sampling at least one subset of the nodes corresponding to the node identifiers; and extracting attributes of the nodes in the subset into the buffer of the third device.
 8. The method according to claim 1, wherein the first device comprises a first field programmable logic gate array and the second device comprises a second field programmable logic gate array.
 9. A system, comprising: a processor; a storage unit coupled to the processor; and a plurality of coupled devices coupled to the storage unit, wherein the plurality of coupled devices comprise a first device and a second device, the first device comprises a memory and at least one buffer, and the second device comprises a memory and at least one buffer; wherein the first device is configured to store information of nodes of a first subgraph of a graph, the second device is configured to store information of nodes of a second subgraph of the graph, and the graph comprises a plurality of subgraphs; the nodes of the first subgraph comprise a first hub node, the nodes of the second subgraph comprise a second hub node, and the first hub node and the second hub node are connected to each other through an edge; and the first device stores attributes and node structure information associated with the second hub node, and the second device stores attributes and node structure information associated with the first hub node; wherein the first device comprises an access engine, and the access engine is configured to: prefetch node identifiers of a plurality of nodes adjacent to a root node that is adjacent to the first hub node, sample at least one subset of the nodes corresponding to the node identifiers, and acquire attributes of the nodes in the subset.
 10. The system according to claim 9, wherein the attributes and node structure information associated with the first hub node and the attributes and node structure information associated with the second hub node comprise one or more of the following attributes and node structure information: a corresponding attribute value, a corresponding node identifier, and a corresponding node structure.
 11. The system according to claim 9, wherein when a gap between the first hub node and a root node of the first subgraph is a single hop, the attributes and node structure information associated with the first hub node are prefetched into a prefetch buffer of the first device.
 12. The system according to claim 9, wherein when the first hub node is separated from a root node and sampled, and a gap between the first hub node and the second hub node is a single hop, the attributes and node structure information associated with the second hub node are prefetched into a prefetch buffer of the first device.
 13. The system according to claim 9, further comprising: a third device coupled to the first device and the second device, wherein the third device comprises a prefetch buffer, and the third device prefetches attributes and node structure information associated with one of the plurality of nodes into a prefetch buffer.
 14. The system according to claim 13, wherein the third device further comprises a traffic monitor, and when traffic measured by the traffic monitor meets a threshold, the third device prefetches the attributes and node structure information associated with the node into the prefetch buffer.
 15. The system according to claim 13, wherein the third device is configured to: in response to a request from the first device, acquire node identifiers of a plurality of nodes adjacent to the second hub node on the second device, sample at least one subset of the nodes corresponding to the node identifiers, and extract attributes of the nodes in the subset into the prefetch buffer of the third device.
 16. The system according to claim 13, wherein the first device comprises a first field programmable logic gate array, the second device comprises a second field programmable logic gate array, and the third device comprises a memory-over-fabric coupled to the first device and the second device.
 17. The system according to claim 9, wherein the first device comprises a first field programmable logic gate array and the second device comprises a second field programmable logic gate array.
 18. A non-transitory computer-readable storage medium comprising computer-executable instructions stored thereon, wherein the computer-executable instructions comprise: a first instruction, used to access a graph, wherein the graph comprises a plurality of subgraphs, the plurality of subgraphs comprise a first subgraph and a second subgraph, the first subgraph comprises a first node set having a first hub node, and the second subgraph comprises a second node set having a second hub node, and the first subgraph and the second subgraph are connected through an edge connecting the first hub node and the second hub node; a second instruction, used to store attributes and node structure information associated with the second hub node in a first device, wherein the first device further stores information associated with the first subgraph; a third instruction, used to store attributes and node structure information associated with the first hub node in a second device, wherein the second device further stores information associated with the second subgraph; and a fourth instruction, used to prefetch the attributes and node structure information associated with the first hub node into a prefetch buffer of the first device when a gap between the first hub node and a root node of the first subgraph is a single hop.
 19. The non-transitory computer-readable storage medium according to claim 18, further comprising: a fifth instruction, used to prefetch the attributes and node structure information associated with the second hub node into the prefetch buffer of the first device when the first hub node is sampled, a gap between the first hub node and a root node of the first subgraph is a single hop, and a gap between the first hub node and the second hub node is a single hop.
 20. The non-transitory computer-readable storage medium according to claim 18, further comprising: a fifth instruction, used to acquire node identifiers of a plurality of nodes adjacent to a root node that is adjacent to the first hub node.